1. Document ID: US 6233575 B1

L4: Entry 1 of 2 File: USPT 



May 15, 2001 



US-PAT-NO: 6233575 

DOCUMENT-IDENTIFIER: US 6233575 B1

TITLE: Multilevel taxonomy based on features derived from training documents 
classification using fisher values as discrimination values 

DATE-ISSUED: May 15, 2001 

INVENTOR-INFORMATION:

NAME CITY STATE ZIP CODE COUNTRY 

Agrawal; Rakesh 
Chakrabarti; Soumen 
Dom; Byron Edward 
Raghavan; Prabhakar 
ASSIGNEE-INFORMATION:

NAME CITY STATE ZIP CODE COUNTRY TYPE CODE

International Business Machines Corporation Armonk NY 02



APPL-NO: 09/102861 [PALM]
DATE FILED: June 23, 1998 



PARENT-CASE: 

PROVISIONAL APPLICATION The present application claims the benefit of U.S. 
Provisional Application Ser. No. 60/050,611, entitled "USING TAXONOMY, 
DISCRIMINANTS, AND SIGNATURES FOR NAVIGATING IN TEXT DATABASES", filed Jun. 24,
1997, by Rakesh Agrawal, et al., which is incorporated herein by reference, in its 
entirety. 

INT-CL-ISSUED: [07] G06F 17/30 

INT-CL-CURRENT: 

TYPE IPC DATE 
CIPP G06 F 17/30 20060101 



US-CL-ISSUED: 707/6; 707/2, 706/12 
US-CL-CURRENT: 707/6; 706/12, 707/2, 707/E17.091
November 1996 EP

OTHER PUBLICATIONS

Ho, T.K. et al., decision combination in multiple classifier systems, IEEE
transactions on pattern analysis and machine intelligence, vol. 16, No. 1, pp 66-75,
Jan. 1994.*

Soumen Chakrabarti et al., Enhanced hypertext categorization using hyperlinks,
proceedings of ACM SIGMOD international conference on Management of data, pp 307-318,
Jun. 1998.*

Yuwono, B et al., search and ranking algorithms for locating resources on world 
wide web, proceedings of the 12th international conference, pp 164-171, Mar. 1996.* 

Hill, P. et al., "Multiple Views of Product Information", IBM Technical Disclosure 
Bulletin, vol. 39, No. 02, pp. 17-24 (Feb. 1996). 

Rus, D. et al., "Using Non-Textual Cues for Electronic Document Browsing", Digital 
Libraries Workshop DL '94, Newark, NJ, USA, May 19-20, 1994 Selected Papers,
Chapter 9, pp. 129-162. 

Koller, D. et al., "Hierarchically Classifying Documents Using Very Few Words", The 
Fourteenth International Conference on Machine Learning, pp. 170-178 (Jul. 1997). 

Mladenic D., "Feature Subset Selection in Text-Learning", 10.sup.th European
Conference on Machine Learning, pp. 95-100, (1998).

Yang, Y. et al., "A Comparative Study on Feature Selection in Text Categorization", 
International Conference on Machine Learning, pp. 412-420 (Jul. 1997). 

Apte, C. et al., "Automated Learning of Decision Rules for Text Categorization",
IBM Research Report RC 18879. To Appear in ACM Transactions on Information Systems, 
pp. 1-20 (no date).; vol. 12, Issue 3, accepted Mar. 1994. 

Schutze, H. et al., "A Comparison of Classifiers and Document Representations for 
the Routing Problem", Proceedings of the 18.sup.th Annual International ACM SIGIR 
Conference on Research and Development in Information Retrieval, pp. 229-237 (Jul. 
1995). 

Lewis, D., "Evaluating Text Categorization", Proceedings of the Speech and Natural 
Language Workshop, Asilomar, pp. 312-318 (Feb. 1991).

Lewis, D., "Feature Selection and Feature Extraction for Text Categorization", 
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York 
pp. 212-217 (Feb. 1992).

Koller, D., "Toward Optimal Feature Selection", In Lorenza Saitta, ed., Machine
Learning: Proc. Of the Thirteenth International Conference, Morgan Kaufmann, 9
pages, (1996).

Panyr, J., "STEINADLER - a system of automatic description and classification of
documents", Nachr. Dok, vol. 29, No. 4-5, pp. 184-191 (Sep. 1978) (Abstract in
English). Abstract in English Only Considered.

ART-UNIT: 217 

PRIMARY-EXAMINER: Breene; John 
ASSISTANT-EXAMINER: Channavajjala; Srirama
ATTY-AGENT-FIRM: Gates & Cooper LLP



ABSTRACT: 

A system, process, and article of manufacture for organizing a large text database 
into a hierarchy of topics and for maintaining this organization as documents are 
added and deleted and as the topic hierarchy changes. Given sample documents 
belonging to various nodes in the topic hierarchy, the tokens (terms, phrases, 
dates, or other usable feature in the document) that are most useful at each 
internal decision node for the purpose of routing new documents to the children of 
that node are automatically detected. Using feature terms, statistical models are 
constructed for each topic node. The models are used in an estimation technique to 
assign topic paths to new unlabeled documents. The hierarchical technique, in which 
feature terms can be very different at different nodes, leads to an efficient 
context-sensitive classification technique. The hierarchical technique can handle 
millions of documents and tens of thousands of topics. A resulting taxonomy and 
path enhanced retrieval system (TAPER) is used to generate context-dependent 
document indexing terms. The topic paths are used, in addition to keywords, for 
better focused searching and browsing of the text database. 
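
The feature-detection step described in the abstract can be illustrated with a minimal sketch. It assumes one common form of the Fisher discriminant ratio (squared differences of per-class mean term frequencies, divided by the summed within-class variances); the record itself gives no formula, so this form, the function names, and the toy data are illustrative assumptions rather than the patented method.

```python
from collections import Counter
from itertools import combinations


def term_frequencies(tokens):
    """Relative frequency of each token in one non-empty document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def fisher_scores(docs_by_child):
    """Score each term at one internal node (assumed Fisher-style ratio).

    docs_by_child maps a child-topic name to a list of tokenized
    training documents. Higher scores mean the term separates the
    children better.
    """
    freqs = {c: [term_frequencies(d) for d in docs]
             for c, docs in docs_by_child.items()}
    vocab = {t for fs in freqs.values() for f in fs for t in f}
    scores = {}
    for t in vocab:
        # Mean relative frequency of the term within each child topic.
        means = {c: sum(f.get(t, 0.0) for f in fs) / len(fs)
                 for c, fs in freqs.items()}
        # Between-class separation: squared mean differences over child pairs.
        between = sum((means[a] - means[b]) ** 2
                      for a, b in combinations(means, 2))
        # Within-class spread: per-child variance, summed over children.
        within = sum(
            sum((f.get(t, 0.0) - means[c]) ** 2 for f in fs) / len(fs)
            for c, fs in freqs.items()
        )
        scores[t] = between / (within + 1e-9)  # epsilon avoids division by zero
    return scores


def top_terms(docs_by_child, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = fisher_scores(docs_by_child)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical training documents for two child topics of one node.
docs = {
    "sports": [["ball", "game", "the"], ["game", "score", "the"]],
    "finance": [["stock", "market", "the"], ["market", "price", "the"]],
}
selected = top_terms(docs, 2)
```

In this toy data, a term that is equally frequent in every document ("the") scores zero, while terms that occur consistently in only one child topic rank highest. Running the same scoring separately at each internal node is what would let the selected terms differ from node to node.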

32 Claims, 10 Drawing figures 
2. Document ID: US 6038561 A

L4: Entry 2 of 2 File: USPT Mar 14, 2000 

US-PAT-NO: 6038561 

DOCUMENT-IDENTIFIER: US 6038561 A 

** See image for Certificate of Correction ** 

TITLE: Management and analysis of document information text 

DATE-ISSUED: March 14, 2000 

INVENTOR-INFORMATION:

NAME CITY STATE ZIP CODE COUNTRY

Snyder; David L. Pittsford NY 

Calistri-Yeh; Randall J. Webster NY 



ASSIGNEE-INFORMATION: 

NAME CITY STATE ZIP CODE COUNTRY TYPE CODE 

Manning & Napier Information Services Rochester NY 02

APPL-NO: 08/929603 [PALM] 
DATE FILED: September 15, 1997 

PARENT-CASE: 

This application claims the benefit of U.S. Provisional Application No. 60/028,437, 
filed Oct. 15, 1996, the full disclosure of which is incorporated by reference. 
CROSS-REFERENCE TO RELATED APPLICATIONS This application claims priority from the 
following U.S. Provisional Application, the disclosure of which, including all 
appendices and all attached documents, is incorporated by reference in its entirety 
for all purposes: U.S. Provisional Patent Application, serial no. 60/028,437, David 
L. Snyder and Randall J. Calistri-Yeh, entitled, "Management and Analysis of Patent 
Information Text (MAPIT)", filed Oct. 15, 1996. Further, this application
incorporates by reference the following U.S. patent applications in their entirety
for all purposes: U.S. patent application Ser. No. 08/696,702, pending Elizabeth D.
Liddy, et al., entitled, "User Interface and Other Enhancements for Natural
Language Information Retrieval System and Method", filed Aug. 14, 1996; and U.S. 
Provisional Patent Application, serial no. 60/042,295, Michael L. Weiner and John 
J. Kolb v., entitled, "Method and Apparatus for Automatic Extraction and Graphic 
Visualization of Textual Information", filed Apr. 1, 1997. CROSS-REFERENCE TO 
ARTICLES Further, this application incorporates by reference the following articles 
in their entirety for all purposes: Liddy, E. D., Paik, W., Yu, E. S. & McVearry,
K., "An overview of DR-LINK and its approach to document filtering," Proceedings of
the ARPA Workshop on Human Language Technology (1993); Liddy, E. D. & Myaeng, S. H.
(1994), DR-LINK System: Phase I Summary. Proceedings of the TIPSTER Phase I Final
Report. Liddy, E. D., Paik, W., Yu, E. S. & McKenna, M. (1994). Document retrieval
using linguistic knowledge. Proceedings of RIAO '94 Conference. Liddy, E. D., Paik,
W., Yu, E. S. Text categorization for multiple users based on semantic information
from an MRD. ACM Transactions on Information Systems. Publication date: 1994.
Presentation date: July, 1994. Liddy, E. D., Paik, W., McKenna, M. & Yu, E. S.
(1995) A natural language text retrieval system with relevance feedback.
Proceedings of the 16th National Online Meeting. Paik, W., Liddy, E. D., Yu, E. S.
& McKenna, M. Categorizing and standardizing proper nouns for efficient information
retrieval. Proceedings of the ACL Workshop on Acquisition of Lexical Knowledge from
Text. Publication date: 1993. Paik, W., Liddy, E. D., Yu, E. S. & McKenna, M.
Interpretation of Proper Nouns for Information Retrieval. Proceedings of the ARPA
Workshop on Human Language Technology. Publication date: 1993. Salton, G. and
Buckley, C. Term-weighting Approaches in Automatic Text Retrieval. Information
Processing and Management. Volume 24, 513-523. Publication date: 1988 ("Salton
reference").

INT-CL-ISSUED: [07] G06F 17/30 

INT-CL-CURRENT: 

TYPE IPC DATE 

CIPP G06 F 17/30 20060101 

US-CL-ISSUED: 707/6; 707/2, 707/3, 707/10, 707/522 

US-CL-CURRENT: 707/6; 707/10, 707/2, 707/3, 707/E17.08, 707/E17.093, 715/522

FIELD-OF-CLASSIFICATION-SEARCH: 707/9, 707/10, 707/6, 707/1, 707/2, 707/3, 707/4,
707/5

See application file for complete search history. 
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US-CL 


4736308    April 1988        Heckel             364/200
4788538    November 1988     Klein et al.
5433199    July 1995         Cline et al.       128/653
                             Pedersen et al.
5483650    January 1996      Pedersen et al.
                             Gilham
           April 1997        Rivette et al.
5623681    April 1997        Rivette et al.
5675819    October 1997      Schuetze           395/760
5696963    December 1997     Ahn
           May 1998          Rivette et al.
5761661    June 1998         Coussens et al.
5787422    July 1998         Tukey et al.       707/5
5799325                      Rivette et al.     707/579
5806079    September 1998    Rivette et al.     707/500
5809318    September 1998    Rivette et al.     707/2
5845301    December 1998     Rivette et al.     707/5
5848409    December 1998     Ahn                707/526


FOREIGN PATENT DOCUMENTS 



FOREIGN-PAT-NO PUBN-DATE COUNTRY CLASS 

0561241 A2 September 1993 EP 15/40 



OTHER PUBLICATIONS 

Liddy, E.D., Paik, W., Yu, E.S. & McVearry, K., "An overview of DR-LINK and its
approach to document filtering," Proceedings of the ARPA Workshop on Human Language
Technology (1993).

Liddy, E.D. & Myaeng, S.H., "DR-LINK System: Phase I Summary," Proceedings of the 
TIPSTER Phase I Final Report, (1994). 

Liddy, E.D., Paik, W., Yu, E.S. & McKenna, M., "Document retrieval using linguistic
knowledge." Proceedings of RIAO '94 Conference, (1994). 

Liddy, E.D., Paik, W., Yu, E.S., "Text categorization for multiple users based on 
semantic information from an MRD." ACM Transactions on Information Systems.
Publication date: 1994. Presentation date: (1994). 

Liddy, E.D., Paik, W., McKenna, M. & Yu, E.S., "A natural language text retrieval 
system with relevance feedback." Proceedings of the 16th National Online Meeting, 
(1995).

Paik, W., Liddy, E.D., Yu, E.S. & McKenna, M., "Categorizing and standardizing 
proper nouns for efficient information retrieval." Proceedings of the ACL Workshop 
on Acquisition of Lexical Knowledge from Text, (1993).

Paik, W., Liddy, E.D., Yu, E.S. & McKenna, M., "Interpretation of Proper Nouns for 
Information Retrieval." Proceedings of the ARPA Workshop on Human Language 
Technology, (1993).

Salton, G. and Buckley, C. "Term-weighting Approaches in Automatic Text Retrieval." 
Information Processing and Management, vol. 24, 513-523. Publication date: (1988)
("Salton reference").

ART-UNIT: 277 

PRIMARY-EXAMINER: Amsbury; Wayne

ASSISTANT-EXAMINER: Alam; Shahid 

ATTY-AGENT-FIRM: Townsend and Townsend and Crew LLP 



ABSTRACT:

An interactive system for analyzing and displaying information contained in a 
plurality of documents employing both term-based analysis and
conceptual-representation analysis. Particulars of the invention are especially effective for
analyzing patent texts, such as patent claims, abstracts and other portions of a 
patent document. 

68 Claims, 35 Drawing figures 